Skip to content

Conversation

@nikita-smetanin
Copy link

  • Add support for Multimodal datasets in OpenAI-like format
  • Add support for Vision-Language model training with optional Vision encoder finetuning

elif messages_are_multimodal != is_multimodal:
# Due to the format limitation, we cannot mix multimodal and text only messages in the same sample.
raise InvalidFileFormatError(
"Messages in the conversation must be either all in multimodal or all intext only format.",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: typo: ...or all in text-only

Comment on lines +273 to +274
message: The message to check.
idx: Line number in the file.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update these

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly dont mind this change but i wonder how it got in


if model_limits.supports_vision:
# Don't show price estimation for multimodal models yet
confirm = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, i don't have context here, why does this prevent showing the price estimation?


if model_limits.supports_vision:
multimodal_params = FinetuneMultimodalParams(train_vision=train_vision)
elif train_vision:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supernit: i prefer
elif not model_limits.supports_vision and train_vision
here. it's logically the same, but the condition is clearer

line_number=idx + 1,
error_source="key_value",
)
if not isinstance(message[column], str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

perhaps you can check isinstance(message[column], MessageContent) instead?

def _check_message_role(
message: Dict[str, str | bool], previous_role: str | None, idx: int
) -> str | bool:
message: Dict[str, str | int | MessageContent], previous_role: str | None, idx: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is the message an int?



def _check_message_content(
message_content: str | int | MessageContent, role: str, idx: int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when is the message an int?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants